Overview

Dataset Statistics

Number of Variables 13
Number of Rows 13762
Missing Cells 30756
Missing Cells (%) 17.2%
Duplicate Rows 37
Duplicate Rows (%) 0.3%
Total Size in Memory 8.7 MB
Average Row Size in Memory 665.5 B
Variable Types
  • Numerical: 4
  • Categorical: 9
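
The two percentages in the overview follow directly from the counts listed with them. A minimal sketch in plain Python that reproduces them from the figures printed above (the variable names are illustrative, not part of the report):

```python
# Figures copied from the Dataset Statistics above.
n_rows = 13762
n_vars = 13
missing_cells = 30756
duplicate_rows = 37

total_cells = n_rows * n_vars                     # every row/column cell
missing_pct = 100 * missing_cells / total_cells   # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_rows     # share of repeated rows

print(f"Missing Cells (%): {missing_pct:.1f}%")     # 17.2%
print(f"Duplicate Rows (%): {duplicate_pct:.1f}%")  # 0.3%
```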

Dataset Insights

negative_reason has 5135 (37.31%) missing values
negative_reason_confidence has 3873 (28.14%) missing values
review_coordinates has 12790 (92.94%) missing values
review_city has 4448 (32.32%) missing values
user_timezone has 4510 (32.77%) missing values
sentiment_confidence is skewed
negative_reason_confidence is skewed
thumbup_count is skewed
user_name has a high cardinality: 7422 distinct values
review_text has a high cardinality: 13429 distinct values
review_coordinates has a high cardinality: 801 distinct values
review_timestamp has a high cardinality: 6617 distinct values
review_city has a high cardinality: 2989 distinct values
user_timezone has a high cardinality: 84 distinct values
negative_reason_confidence has 1262 (9.17%) zeros
thumbup_count has 13048 (94.81%) zeros
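
The per-column missing counts in the insights can be cross-checked against the Missing Cells total in the overview. A small sketch using only the numbers printed in this report (the dictionary and variable names are illustrative):

```python
# Missing-value counts copied from the Dataset Insights above.
missing = {
    "negative_reason": 5135,
    "negative_reason_confidence": 3873,
    "review_coordinates": 12790,
    "review_city": 4448,
    "user_timezone": 4510,
}
n_rows = 13762

total_missing = sum(missing.values())
# Columns sorted from most to least incomplete.
ranked = sorted(missing.items(), key=lambda kv: kv[1], reverse=True)
for name, count in ranked:
    print(f"{name}: {count} ({100 * count / n_rows:.2f}%)")
print("total:", total_missing)  # 30756, matching Missing Cells
```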

Variables


review_id

numerical

Approximate Distinct Count 13620
Approximate Unique (%) 99.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 215.0 KB
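Mean -4.5501e+14 (unreliable: falls outside the Minimum to Maximum range, consistent with a 64-bit integer overflow in the sum)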
Minimum 5.6759e+17
Maximum 5.7031e+17
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • review_id is skewed left (γ1 = -0.4783)

Quantile Statistics

Minimum 5.6759e+17
5th Percentile 5.6781e+17
Q1 5.6856e+17
Median 5.6948e+17
Q3 5.6989e+17
95th Percentile 5.7026e+17
Maximum 5.7031e+17
Range 2.7223e+15
IQR 1.3323e+15

Descriptive Statistics

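Mean -4.5501e+14 (unreliable, see Sum)
Standard Deviation 7.7936e+14
Variance 6.0739e+29
Sum -6.2618e+18 (negative although every value is positive, consistent with a 64-bit integer overflow)
Skewness -0.4783
Kurtosis -1.0493
Coefficient of Variation -1.7128 (derived from the unreliable mean)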
  • review_id is not normally distributed (p-value 0.003565052517435879)
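
The reported mean and sum for review_id are negative even though the minimum is about 5.6759e+17, which is what a wrapped 64-bit signed sum would look like. The sketch below illustrates the effect in plain Python under that assumption; the `ids` list is hypothetical and is not the real column, and `wrap_int64` is an illustrative helper, not part of any profiling library.

```python
INT64_MAX = 2**63 - 1

def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to the signed 64-bit range."""
    return (value + 2**63) % 2**64 - 2**63

n_rows = 13762
min_id = 567_590_000_000_000_000   # ~5.6759e+17, the reported Minimum (rounded)

# Even if every id equalled the minimum, the true sum exceeds INT64_MAX.
lower_bound_sum = n_rows * min_id
print(lower_bound_sum > INT64_MAX)            # True: the sum cannot fit in int64

# Hypothetical ids for illustration only; not the real data.
ids = [min_id + i * 1_000_000 for i in range(n_rows)]
exact_mean = sum(ids) / len(ids)              # Python ints do not overflow
wrapped_mean = wrap_int64(sum(ids)) / len(ids)
print(min(ids) <= exact_mean <= max(ids))     # True
print(min(ids) <= wrapped_mean <= max(ids))   # False: wrapped mean leaves the range
```

Since review_id is an identifier rather than a measurement, one option is to load it as a string so that no arithmetic summary is computed for it at all.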

user_name

categorical

Approximate Distinct Count 7422
Approximate Unique (%) 53.9%
Missing 0
Missing (%) 0.0%
Memory Size 1015.1 KB
  • The most frequent value (JetBlueNews) occurs over 1.87 times as often as the second most frequent value (kbosspotter)

Length

Mean 10.5339
Standard Deviation 2.6193
Median 11
Minimum 2
Maximum 19

Sample

1st row cairdin
2nd row jnardino
3rd row yvonnalynn
4th row jnardino
5th row jnardino

Letter

Count 135691
Lowercase Letter 119871
Space Separator 7
Uppercase Letter 15820
Dash Punctuation 0
Decimal Number 7417
  • user_name contains many words: 7425 words
  • The most frequent word (jetbluenews) occurs over 1.87 times as often as the second most frequent word (kbosspotter)

airline_sentiment

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 978.2 KB
  • The most frequent value (negative) occurs over 2.96 times as often as the second most frequent value (neutral)

Length

Mean 7.7885
Standard Deviation 0.4084
Median 8
Minimum 7
Maximum 8

Sample

1st row neutral
2nd row positive
3rd row neutral
4th row negative
5th row negative

Letter

Count 107186
Lowercase Letter 107186
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (negative, neutral) account for over 50.0% of rows
  • The most frequent word (negative) occurs over 2.96 times as often as the second most frequent word (neutral)
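
The report does not print per-class counts, but rough class shares can be estimated from the figures it does give. The sketch below assumes the three distinct values are exactly "negative", "neutral" and "positive" (as the samples suggest), and treats the "over 2.96 times" ratio as approximate, so the results are estimates rather than reported values.

```python
# Figures copied from the airline_sentiment section.
mean_length = 7.7885     # mean label length in characters
top_ratio = 2.96         # negative occurs "over 2.96 times" as often as neutral

# "neutral" has 7 letters; "negative" and "positive" both have 8, so
# mean_length = 7 * p_neutral + 8 * (1 - p_neutral).
p_neutral = 8 - mean_length
p_negative = top_ratio * p_neutral          # approximate lower bound
p_positive = 1 - p_neutral - p_negative     # approximate remainder

for label, share in [("negative", p_negative),
                     ("neutral", p_neutral),
                     ("positive", p_positive)]:
    print(f"{label}: ~{100 * share:.1f}%")
```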

sentiment_confidence

numerical

Approximate Distinct Count 1011
Approximate Unique (%) 7.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 215.0 KB
Mean 0.9002
Minimum 0.335
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • sentiment_confidence is skewed left (γ1 = -1.2413)

Quantile Statistics

Minimum 0.335
5th Percentile 0.6444
Q1 0.6925
Median 1
Q3 1
95th Percentile 1
Maximum 1
Range 0.665
IQR 0.3075

Descriptive Statistics

Mean 0.9002
Standard Deviation 0.1628
Variance 0.02652
Sum 12388.7587
Skewness -1.2413
Kurtosis 0.3056
Coefficient of Variation 0.1809
  • sentiment_confidence is not normally distributed (p-value 1.4031532508276112e-24)
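
The γ1 figure is the skewness coefficient: the third central moment divided by the cube of the standard deviation. A minimal sketch of the population form follows; which exact estimator the report uses is an assumption, and the `sample` list is illustrative, not the real column.

```python
def skewness(values):
    """Population skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Illustrative sample only: most scores sit at 1.0 with a tail of lower
# values, which is the shape the quantiles above describe.
sample = [1.0] * 6 + [0.7, 0.65, 0.6, 0.35]
print(skewness(sample) < 0)   # True: the long tail is on the left
```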

negative_reason

categorical

Approximate Distinct Count 10
Approximate Unique (%) 0.1%
Missing 5135
Missing (%) 37.3%
Memory Size 685.5 KB
  • The most frequent value (Customer Service Issue) occurs over 1.74 times as often as the second most frequent value (Late Flight)

Length

Mean 16.3615
Standard Deviation 5.8492
Median 16
Minimum 9
Maximum 27

Sample

1st row Bad Flight
2nd row Can't Tell
3rd row Can't Tell
4th row Late Flight
5th row Bad Flight

Letter

Count 127893
Lowercase Letter 107286
Space Separator 12148
Uppercase Letter 20607
Dash Punctuation 0
Decimal Number 0

negative_reason_confidence

numerical

Approximate Distinct Count 1390
Approximate Unique (%) 14.1%
Missing 3873
Missing (%) 28.1%
Infinite 0
Infinite (%) 0.0%
Memory Size 154.5 KB
Mean 0.6392
Minimum 0
Maximum 1
Zeros 1262
Zeros (%) 9.2%
Negatives 0
Negatives (%) 0.0%
  • negative_reason_confidence is skewed left (γ1 = -0.6021)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0.3609
Median 0.6706
Q3 1
95th Percentile 1
Maximum 1
Range 1
IQR 0.6391

Descriptive Statistics

Mean 0.6392
Standard Deviation 0.3309
Variance 0.1095
Sum 6320.8325
Skewness -0.6021
Kurtosis -0.6602
Coefficient of Variation 0.5177
  • negative_reason_confidence is not normally distributed (p-value 7.911550479985974e-17)
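
The missing and zero counts for negative_reason_confidence add up exactly to the missing count for negative_reason (3873 + 1262 = 5135). That suggests, but does not prove, that rows without a negative reason carry either an empty or a zero confidence. The sketch below shows the arithmetic and a hypothetical row-level check; `reason_is_unlabelled` and the example rows are illustrative and not taken from the dataset.

```python
# Counts copied from this report.
reason_missing = 5135        # negative_reason: Missing
confidence_missing = 3873    # negative_reason_confidence: Missing
confidence_zeros = 1262      # negative_reason_confidence: Zeros

# The two confidence counts add up exactly to the missing-reason count.
print(confidence_missing + confidence_zeros == reason_missing)  # True

def reason_is_unlabelled(reason, confidence):
    """Hypothetical check: no reason AND confidence is empty or 0."""
    return reason is None and (confidence is None or confidence == 0)

# Illustrative (reason, confidence) rows only.
rows = [("Late Flight", 0.70), (None, None), (None, 0.0), ("Bad Flight", 1.0)]
print(sum(reason_is_unlabelled(r, c) for r, c in rows))  # 2
```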

airline_name

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1011.9 KB

Length

Mean 10.2938
Standard Deviation 3.9655
Median 9
Minimum 6
Maximum 18

Sample

1st row EgyptAir
2nd row EgyptAir
3rd row EgyptAir
4th row EgyptAir
5th row EgyptAir

Letter

Count 136840
Lowercase Letter 117789
Space Separator 4823
Uppercase Letter 19051
Dash Punctuation 0
Decimal Number 0

review_text

categorical

Approximate Distinct Count 13429
Approximate Unique (%) 97.6%
Missing 0
Missing (%) 0.0%
Memory Size 2.0 MB
  • The most frequent value (#NAME?) occurs over 3.12 times as often as the second most frequent value (thank you!)

Length

Mean 89.2146
Standard Deviation 36.0258
Median 99
Minimum 3
Maximum 175

Sample

1st row What said.
2nd row plus you've added ...
3rd row I didn't today... ...
4th row it's really aggres...
5th row and it's a really ...

Letter

Count 965615
Lowercase Letter 909768
Space Separator 209454
Uppercase Letter 55847
Dash Punctuation 1782
Decimal Number 0
  • review_text contains many words: 12835 words
  • The most frequent word (i) occurs over 1.61 times as often as the second most frequent word (flight)

thumbup_count

numerical

Approximate Distinct Count 18
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 215.0 KB
Mean 0.08276
Minimum 0
Maximum 44
Zeros 13048
Zeros (%) 94.8%
Negatives 0
Negatives (%) 0.0%
  • thumbup_count is skewed right (γ1 = 33.6178)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 0
Q3 0
95th Percentile 1
Maximum 44
Range 44
IQR 0

Descriptive Statistics

Mean 0.08276
Standard Deviation 0.7631
Variance 0.5823
Sum 1139
Skewness 33.6178
Kurtosis 1483.833
Coefficient of Variation 9.2198
  • thumbup_count is not normally distributed (p-value 4.734023996487799e-25)
  • thumbup_count has 714 outliers
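
The outlier count equals the number of non-zero rows (13762 - 13048 = 714). That is what a 1.5 × IQR rule would produce when Q1 and Q3 are both 0, because the fences collapse onto 0. Whether the report uses this rule is an assumption; the sketch below only shows that the numbers are consistent with it, using an illustrative `counts` list.

```python
def iqr_outliers(values, q1, q3, k=1.5):
    """Return values outside [q1 - k*IQR, q3 + k*IQR] (Tukey's fences)."""
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Q1 and Q3 are both 0 in the report, so the fences collapse to [0, 0].
# Illustrative counts only: mostly zeros with a few positive values.
counts = [0] * 95 + [1, 1, 2, 5, 44]
print(len(iqr_outliers(counts, q1=0, q3=0)))   # 5: every non-zero value

# Non-zero rows according to the report: Number of Rows minus Zeros.
print(13762 - 13048)                           # 714
```

With a column this sparse, the outlier flag says little beyond "the value is not zero", so a separate look at the non-zero values may be more informative.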

review_coordinates

categorical

Approximate Distinct Count 801
Approximate Unique (%) 82.4%
Missing 12790
Missing (%) 92.9%
Memory Size 84.6 KB
  • The most frequent value ([0.0, 0.0]) occurs over 30.2 times as often as the second most frequent value ([40.64656067, -73.78334045])

Length

Mean 24.1409
Standard Deviation 6.1376
Median 27
Minimum 10
Maximum 28

Sample

1st row [40.74804263, -73....
2nd row [42.361016, -71.02...
3rd row [33.94540417, -118...
4th row [33.94209449, -118...
5th row [33.2145038, -96.9...

Letter

Count 0
Lowercase Letter 0
Space Separator 972
Uppercase Letter 0
Dash Punctuation 806
Decimal Number 16827
  • review_coordinates contains many words: 1600 words
  • The most frequent word (00) occurs over 60.4 times as often as the second most frequent word (7378334045)
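
review_coordinates is profiled as text, but the samples look like "[latitude, longitude]" pairs. A minimal parsing sketch follows. Treating [0.0, 0.0] as a placeholder rather than a real location is an assumption based on how often it occurs, and `parse_coordinates` is an illustrative helper name.

```python
import json

def parse_coordinates(text):
    """Parse '[lat, lon]' into a (lat, lon) tuple, or None if unusable."""
    if not text:
        return None
    lat, lon = json.loads(text)
    # Assumption: [0.0, 0.0] is a placeholder, not a real location.
    if lat == 0.0 and lon == 0.0:
        return None
    return float(lat), float(lon)

print(parse_coordinates("[40.64656067, -73.78334045]"))  # (40.64656067, -73.78334045)
print(parse_coordinates("[0.0, 0.0]"))                   # None
print(parse_coordinates(None))                           # None
```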

review_timestamp

categorical

Approximate Distinct Count 6617
Approximate Unique (%) 48.1%
Missing 0
Missing (%) 0.0%
Memory Size 1.0 MB

Length

Mean 14.6705
Standard Deviation 0.47
Median 15
Minimum 14
Maximum 15

Sample

1st row 24/2/2015 11:35
2nd row 24/2/2015 11:15
3rd row 24/2/2015 11:15
4th row 24/2/2015 11:15
5th row 24/2/2015 11:14

Letter

Count 0
Lowercase Letter 0
Space Separator 13762
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 146848
  • review_timestamp contains many words: 1419 words
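
review_timestamp is profiled as categorical text, which is why it is flagged for high cardinality. The sample rows look like day-first timestamps without zero padding, so they can be parsed with the standard library. The format string below is inferred from the five samples only and may not hold for every row.

```python
from datetime import datetime

# Day-first format inferred from the sample rows, e.g. "24/2/2015 11:35".
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M"

def parse_timestamp(text):
    """Convert a review_timestamp string into a naive datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)

ts = parse_timestamp("24/2/2015 11:35")
print(ts.isoformat())   # 2015-02-24T11:35:00
```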

review_city

categorical

Approximate Distinct Count 2989
Approximate Unique (%) 32.1%
Missing 4448
Missing (%) 32.3%
Memory Size 727.5 KB

Length

Mean 13.4027
Standard Deviation 6.3904
Median 13
Minimum 1
Maximum 34

Sample

1st row Lets Play
2nd row San Francisco CA
3rd row Los Angeles
4th row Los Angeles
5th row 1/1 loner squad

Letter

Count 101599
Lowercase Letter 77073
Space Separator 12120
Uppercase Letter 24526
Dash Punctuation 377
Decimal Number 3344
  • review_city contains many words: 2447 words

user_timezone

categorical

Approximate Distinct Count 84
Approximate Unique (%) 0.9%
Missing 4510
Missing (%) 32.8%
Memory Size 784.1 KB
  • The most frequent value (Eastern Time (US & Canada)) occurs over 1.94 times as often as the second most frequent value (Central Time (US & Canada))

Length

Mean 21.7797
Standard Deviation 7.91
Median 26
Minimum 3
Maximum 27

Sample

1st row Eastern Time (US &...
2nd row Pacific Time (US &...
3rd row Central Time (US &...
4th row Pacific Time (US &...
5th row Pacific Time (US &...

Letter

Count 151507
Lowercase Letter 113741
Space Separator 28378
Uppercase Letter 37766
Dash Punctuation 15
Decimal Number 0

Interactions

Correlations

Missing Values